Deep learning for word-level handwritten Indic script identification

نویسندگان

  • Soumya Ukil
  • Swarnendu Ghosh
  • Sk Md Obaidullah
  • K. C. Santosh
  • Kaushik Roy
  • Nibaran Das
چکیده

We propose a novel method that uses convolutional neural networks (CNNs) for feature extraction. Not just limited to conventional spatial domain representation, we use multilevel 2D discrete Haar wavelet transform, where image representations are scaled to a variety of different sizes. These are then used to train different CNNs to select features. To be precise, we use 10 different CNNs that select a set of 10240 features, i.e. 1024/CNN. With this, 11 different handwritten scripts are identified, where 1K words per script are used. In our test, we have achieved the maximum script identification rate of 94.73% using multi-layer perceptron (MLP). Our results outperform the state-of-the-art techniques.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Indic Handwritten Script Identification using Offline-Online Multimodal Deep Network

In this paper, we propose a novel approach of word-level Indic script identification using only character-level data in training stage. The advantages of using character level data for training have been outlined in section I. Our method uses a multimodal deep network which takes both offline and online modality of the data as input in order to explore the information from both the modalities j...

متن کامل

Recurrent neural networks based Indic word-wise script identification using character-wise training

This paper presents a novel methodology of Indic handwritten script recognition using Recurrent Neural Networks and addresses the problem of script recognition in poor data scenarios, such as when only character level online data is available. It is based on the hypothesis that curves of online character data comprise sufficient information for prediction at the word level. Online character dat...

متن کامل

Word level Script Identification from Bangla and Devanagri Handwritten Texts mixed with Roman Script

India is a multi-lingual country where Roman script is often used alongside different Indic scripts in a text document. To develop a script specific handwritten Optical Character Recognition (OCR) system, it is therefore necessary to identify the scripts of handwritten text correctly. In this paper, we present a system, which automatically separates the scripts of handwritten words from a docum...

متن کامل

A new dataset of word-level offline handwritten numeral images from four official Indic scripts and its benchmarking using image transform fusion

Handwritten document image dataset development is one of the most tedious and time consuming tasks in optical character recogniser (OCR) related experimental work. Special attention need to be given in terms of feasibility, realness, clarity etc. while collecting real life data from different writers. Few efforts can be found in the literature for development of handwritten NIdb (numeral image ...

متن کامل

Cross-language Framework for Word Recognition and Spotting of Indic Scripts

Handwritten word recognition and spotting of low-resource scripts are difficult as sufficient training data is not available and it is often expensive for collecting data of such scripts. This paper presents a novel cross language platform for handwritten word recognition and spotting for such low-resource scripts where training is performed with a sufficiently large dataset of an available scr...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1801.01627  شماره 

صفحات  -

تاریخ انتشار 2018